feat(otel): add token breakdown attributes to conclusion spans#26121
Read agent_usage.json in sendJobConclusionSpan and emit gh-aw.tokens.input, gh-aw.tokens.output, gh-aw.tokens.cache_read, and gh-aw.tokens.cache_write span attributes.

Closes #<issue>

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5f05f55a-111f-459e-9aa5-34bca00d4a14
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull request overview
Adds per-token-type breakdown attributes to the job conclusion OTLP span by reading the existing /tmp/gh-aw/agent_usage.json, enabling more detailed observability (cache hit rate, per-type cost attribution, and alerting) than the current gh-aw.effective_tokens aggregate alone.
Changes:
- Enriches `sendJobConclusionSpan` with four new conditional span attributes: input, output, cache read, and cache write tokens.
- Updates the `sendJobConclusionSpan` JSDoc to document the additional runtime file.
- Adds a focused test suite covering presence/absence, zero-value omission, and invalid JSON handling for `agent_usage.json`.
| File | Description |
|---|---|
| actions/setup/js/send_otlp_span.cjs | Reads agent_usage.json and conditionally emits gh-aw.tokens.* attributes on conclusion spans. |
| actions/setup/js/send_otlp_span.test.cjs | Adds tests validating token breakdown enrichment behavior in conclusion spans. |
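
The enrichment described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `readJSONIfExists` body is a hypothetical stand-in for the existing helper, the JSON field names (`input_tokens`, etc.) are assumptions about the shape of `agent_usage.json`, and `tokenBreakdownAttributes` is a made-up name. Only the four `gh-aw.tokens.*` attribute keys and the "emit only when value > 0" rule come from the PR text.

```javascript
const fs = require("node:fs");

// Hypothetical stand-in for the existing readJSONIfExists helper:
// returns the parsed object, or null when the file is missing or not valid JSON.
function readJSONIfExists(filePath) {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8"));
  } catch {
    return null;
  }
}

// Assumed mapping from agent_usage.json fields to span attribute keys.
// The field names on the left are guesses; the keys on the right are from the PR.
const TOKEN_FIELDS = [
  ["input_tokens", "gh-aw.tokens.input"],
  ["output_tokens", "gh-aw.tokens.output"],
  ["cache_read_tokens", "gh-aw.tokens.cache_read"],
  ["cache_write_tokens", "gh-aw.tokens.cache_write"],
];

// Builds OTLP-style integer attributes, skipping missing, non-numeric,
// and zero values so that only meaningful counts reach the span.
function tokenBreakdownAttributes(usage) {
  const attributes = [];
  if (!usage || typeof usage !== "object") return attributes;
  for (const [field, key] of TOKEN_FIELDS) {
    const value = usage[field];
    if (typeof value === "number" && value > 0) {
      attributes.push({ key, value: { intValue: value } });
    }
  }
  return attributes;
}
```

In this sketch, a conclusion span would append `tokenBreakdownAttributes(readJSONIfExists("/tmp/gh-aw/agent_usage.json"))` to its attribute list. An absent or malformed file yields an empty array, so the span is sent unchanged in that case.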
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
🧪 Test Quality Sentinel Report
Test Quality Score: 83/100
✅ Excellent test quality
Test Classification Details
Flagged Tests - Requires Review
No tests require immediate review. One non-blocking observation is noted below.
ℹ️ Test inflation ratio (informational)
Language Support
Tests analyzed:
Verdict
📖 Understanding Test Classifications
Design Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.
✅ Test Quality Sentinel: 83/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 4 new tests are behavioral contracts that verify observable OTLP span attributes under distinct scenarios (happy path, absent file, zero-value filtering, invalid JSON). No coding-guideline violations detected.
`sendJobConclusionSpan` emitted only `gh-aw.effective_tokens`, a cost-weighted aggregate, while the full per-type breakdown already existed in `/tmp/gh-aw/agent_usage.json` (written by `parse_token_usage.cjs`) but was never read.

Changes
- `send_otlp_span.cjs`: After the `gh-aw.effective_tokens` block, reads `agent_usage.json` via the existing `readJSONIfExists` helper and conditionally pushes four new span attributes (only when value > 0):
  - `gh-aw.tokens.input`
  - `gh-aw.tokens.output`
  - `gh-aw.tokens.cache_read`
  - `gh-aw.tokens.cache_write`
- `send_otlp_span.test.cjs`: New `describe("token breakdown enrichment in conclusion span")` block covering: all-fields present, file absent, zero-value omission, and invalid JSON, matching the pattern used by the rate-limit enrichment tests.

These attributes enable cache-hit-rate panels (`cache_read / (cache_read + cache_write)`), per-type cost attribution, and fine-grained threshold alerts without requiring step summary HTML.
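
As a rough illustration of the cache-hit-rate formula above, a dashboard or post-processing script might compute the ratio from a span's attributes like this. The function name `cacheHitRate` and the flat key/value input shape are assumptions made for the example. Because zero values are omitted from the span, a missing attribute is treated as 0 here.

```javascript
// Computes cache_read / (cache_read + cache_write) from a flat map of
// span attribute keys to numeric values (hypothetical input shape).
// Returns null when no cache tokens were recorded, to avoid dividing by zero.
function cacheHitRate(attrs) {
  const read = attrs["gh-aw.tokens.cache_read"] ?? 0;
  const write = attrs["gh-aw.tokens.cache_write"] ?? 0;
  const total = read + write;
  return total > 0 ? read / total : null;
}

const rate = cacheHitRate({
  "gh-aw.tokens.cache_read": 750,
  "gh-aw.tokens.cache_write": 250,
});
console.log(rate); // prints 0.75
```

Returning `null` instead of `0` when both attributes are absent lets a panel distinguish "no cache activity" from "cache activity with no hits".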